Traffic-Aware Geo-Distributed Big Data Analytics with Predictable Job Completion Time
Authors
Abstract
Similar resources
Bohr: Similarity Aware Geo-distributed Data Analytics
We propose Bohr, a similarity aware geo-distributed data analytics system that minimizes query completion time. The key idea is to exploit similarity between data in different data centers (DCs), and transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Though these sites have more input data to process, these data are more similar and can be more efficiently aggr...
PingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System
Geo-distributed data analysis in a cloud-edge system is emerging as a daily demand. To save time on wide-area data transfer, some tasks are dispatched to edge clusters to satisfy data locality. However, execution in the edge clusters performs worse, due to limited resources, overload interference and cluster-level unreachability, which obstructs the guarantee on the speed and comple...
Tensor Completion Algorithms in Big Data Analytics
Tensor completion is the problem of filling the missing or unobserved entries of partially observed tensors. Due to the multidimensional character of tensors in describing complex datasets, tensor completion algorithms and their applications have received wide attention and achievement in data mining, computer vision, signal processing, and neuroscience, etc. In this survey, we provide a modern ove...
Towards Privacy aware Big Data analytics
Big Data platforms allow the integration and analysis of high volumes of data with heterogeneous format from different sources. Big Data analytics support the derivation of properties and correlations among data and are considered by companies a key asset to make business decisions. The analyzed data often include personal and sensitive information, thus the analysis implies threats to privacy,...
An Algebra for Distributed Big Data Analytics
We present an algebra for data-intensive scalable computing based on monoid homomorphisms that consists of a small set of operations that capture most features supported by current domain-specific languages for data-centric distributed computing. This algebra is being used as the formal basis of MRQL, which is a query processing and optimization system for large-scale distributed data analysis....
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2017
ISSN: 1045-9219
DOI: 10.1109/tpds.2016.2626285